Acquiring Bilingual Lexica from Keyword Listings
Authors
Abstract
In this paper, a new method for acquiring bilingual dictionaries from on-line text corpora is presented. The method merges rule-based techniques for obtaining dictionaries from structured data, such as paper dictionaries (in electronic form) or on-line glossaries, with methods used by alignment tools such as GIZA. The basic idea is to search for anchor words such as "abstract" or "keywords" followed by their equivalents in another language. Text fragments that follow anchor words are likely to supply new entries for bilingual lexica.
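The anchor-word idea can be illustrated with a minimal sketch. It is not the authors' implementation: the anchor lists, the English/Polish language pair, the function names, and the rule of pairing keywords by position are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical anchor words per language; the abstract names "abstract" and
# "keywords" as examples but does not give a full list or a language pair.
ANCHORS = {
    "en": ["keywords", "key words"],
    "pl": ["słowa kluczowe"],
}


def find_keyword_list(text, anchors):
    """Return the phrases that follow the first matching anchor, or []."""
    for anchor in anchors:
        # Anchor at the start of a line, an optional colon or dash,
        # then the rest of that line as the keyword listing.
        pattern = re.compile(
            r"^[ \t]*" + re.escape(anchor) + r"[ \t]*[:\-]?[ \t]*(.+)$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(text)
        if match:
            items = re.split(r"[;,]", match.group(1))
            return [i.strip(" .").lower() for i in items if i.strip(" .")]
    return []


def candidate_pairs(text, source="en", target="pl"):
    """Pair keywords by position (an assumption, not the paper's rule).

    Listings of different lengths are rejected, because positional
    pairing would be unreliable for them.
    """
    src = find_keyword_list(text, ANCHORS[source])
    tgt = find_keyword_list(text, ANCHORS[target])
    if not src or len(src) != len(tgt):
        return []
    return list(zip(src, tgt))


sample = (
    "Keywords: bilingual lexicon, text corpus, alignment\n"
    "Słowa kluczowe: leksykon dwujęzyczny, korpus tekstów, dopasowanie\n"
)
pairs = candidate_pairs(sample)
```

Positional pairing is the simplest possible heuristic. Pairs obtained this way are only candidates, and in practice they would need filtering, for example with the statistical alignment methods the abstract mentions.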
Similar resources
Creating bilingual lexica using reference wordlists for alignment of monolingual semantic vector spaces
This paper proposes a novel method for automatically acquiring multilingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building monolingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word li...
Towards producing bilingual lexica from monolingual corpora
Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embed...
Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover's Distance Regularization
Pages 3188–3198, Osaka, Japan, December 11-17, 2016. Meng Zhang, Yang Liu, Huanbo Luan, Yiqun Liu, Maosong Sun. State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua Univers...
Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision
Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances in continuous word representations have opened up new possibilities for this task, e.g. by establishing a cross-lingual mapping between word embeddings via a seed lexicon. The method is h...
Extracting Bilingual Lexica from Comparable Corpora Using Self-Organizing Maps
This paper aims to present a novel method of extracting bilingual lexica from comparable corpora using one of the artificial neural network algorithms, self-organizing maps (SOMs). The proposed method is very useful when a seed dictionary for translating source words into target words is insufficient. Our experiments have shown stunning results when contrasted with one of the other approaches. ...
Publication date: 2009